-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maven-like repository for JavaCPP resources #120
Conversation
Sorry for taking some time to reply. I was worried this wouldn't work on some platforms, so I've been testing this out on all platforms, and (apart from an obvious bug) it seems to be working surprisingly well everywhere. Good job! Thinking about this some more though, it looks more like a cache than a repo, no? If we consider it as a cache, we wouldn't need any coordinates. Do we need "clean" paths? |
Since this new code is basically getting used only when loading files found in a JAR archive, what if we simply used the name of the archive? So that files found in In any case, there's at least one issue that we need to resolve. We'll need to make sure that the files in the cache/repo are the same as the ones in resources, and if not overwrite them. Just a call to |
To check for modified files, maybe we could just do like rsync? "Rsync finds files that need to be transferred using a lqquick checkrq algorithm (by default) that looks for files that have changed in size or in last-modified time." |
As you know once the second PR is ready, you won't need to install anything on your machine before you can build native code. I think your project will enable all kinds of apps to add little bits of native code, as it will be really convenient to do. The only way I think to make sure we don't have name conflicts and proper versioning across a large number of apps is to closely stick to Maven coordinates. They provide global unique names, versions etc. and Maven files cannot be changed. Checking for content should only be necessary for *-SNAPSHOT jars. Maybe we can leverage their checksum mechanism too, or do what most Maven tools seem to do for SNAPSHOT versions which is to always replace them? |
Extracting files in the tmpdir already allows to have a large number of applications running without conflicts, so it's not a good reason to change that behavior. There are issues with temp files that can be dealt with using a cache, without adding a lot complexity, because it's basically transparent, so that's good I think. But what you propose isn't transparent. We would have cases when what we have in Java resources doesn't match what we have in JavaCPP resources, and that would require some tools to deal with. Both Maven and Nexus do allow redeploying artifacts with non-SNAPSHOT versions, for debugging, recovering from failed operations etc. I personally wouldn't be ready to deal with the added complexity of an additional repo, unless the tools are there and there's a clear advantage over something like a cache. BTW, simply overwriting files wouldn't work well when they are already opened by another process. Applications could fail for obscure reasons. For a cache, a hash based on the content of the JAR file like you propose is good, but we can use nicer names based on the name, date, and size of the JAR, which should avoid conflicts, but would require a max cache size option and some appropriate cleanup a la ccache. |
I like the simplicity of a pure cache story, but it might be limiting when using it to build against it. E.g. if you are playing with custom builds, or runtime code generation etc, it's handy to have a fixed path to include. Each library version has a single path it can be found at, and that can be relied upon to be there. I think it's worth the small added complexity of fixed location, plus checking if the files have changed. I looked about this a little, Maven does let you republish files, but it seems really exceptional and something devs only would play with in practice. It seems maven does an additional check for the timestamp, in addition to testing for the file existence. We could replicate this by looking at the timestamp in .m2, and re-expanding if changed? |
I added support for detecting Maven re-deployments. If the timestamp in .m2 is higher than ours, the jar is expanded again. Would that work? Do you agree with my argument about stable paths to native libraries being very valuable? Thanks |
Something like that would work, yes, but it's not enough. rsync checks for both the size of files and their last modified dates. So, in this case as well, let's make sure that a file already extracted has the same last modified date and file size than one found in the JAR file. Then we can reasonably assume the content is the same, and I would be fine with that. |
Another important thing that we'll need to fix is Android support. Android doesn't support the FileSystem API from Java 7, but it has its own buggy ways of dealing with JNI. Anyway it doesn't matter. We just need to make sure things like ClassNotFoundException, NoSuchMethodError, and such are handled properly, as is already done on these lines: |
Instead of some cryptic hash, what about if we use the following mapping for resources without coordinates? |
Yes that sounds good, I added a commit to remove the hash. About android support, I'm not sure what to do. If I add try-catch blocks, I still need to fail further along, as the files would not be deployed. And about the file size check, I don't think it is necessary. There is no way the maven repo can be updated without the timestamp changing. Also I don't see a clean way to implement it, as I need to compare the size of a jar to a expanded folder. Would the timestamp only work? thks |
Thanks! Could you also make sure the extracted files have the same size and same last modified date as well? Just checking that files in JAR files are more recent isn't enough... As for Android, we don't need to extract the files for Android. They get extracted at install time by the Android installer, and in theory, we only need to have the fallback |
BTW, I'm not talking about the size and the date of the JAR files, but of the files in the JAR files. From what I understand, we would be adding the check here: If, for example, the .m2 repository gets corrupted, the last modified dates jump into the future, but then restored from a backup, of course the last modified dates can go backwards. Like I said previously, if we are to consider this a cache, we need to make sure that what's in it isn't out of sync with the actual repository, with reasonable certainty. What rsync does by default appears to work for most people so let's just do the same! On the other hand, if we consider this a repository, we are going to need more tools to deal with these inconsistencies that will arise in various situations that are going to happen in production (Murphy's law). For example, how could we "deploy" the files without running an application that uses JavaCPP? These kinds of questions don't matter if this is just a cache. |
I also don't want this to be a full repo, definitely more like a cache. The only thing is that I would like deterministic paths to content, so that we can build code against it. Initially I thought having its own root in home would make it easier to do that, but maybe not. Moving it to .cache/javacpp as you suggest should be OK, I don't see people clearing their cache while an application is running or messing with the content. For content validation, I still don't see the value for the additional complexity that would incur, and the fact that every launch will require opening and listing the full directory of each jar in the dependency tree even if one file is actually used etc. I can write it, but it will take time and I don't see the point. You mention the scenario where someone would restore their .m2 from backup but not the rest of their home folder. That's already a bit of a stretch, but even there it can work with time stamps. When extracting the jar, we set the modified time on the folder to the one of the jar. If it doesn't match for any reason we re-extract, whether it is in the future or in the past. I will add the Android try/catch blocks and look at the emulator. Do you have scripts for that already to save time, or not yet? Thanks |
The more I think about this, the more it seems like we're trying to do something that's way more complicated than it needs to be. Native code cannot easily access Java resources found in JAR files, and that's basically the only problem that needs to be solved, correct? When the resources are actually files on the native file system, we don't need to do anything other than to pass in the right path. We don't want to copy them, right? Only when they are in JAR files do we need to extract them somewhere, and IMO that would be the only thing that JavaCPP's cache needs to do. To get consistent naming, we can just use the name of the JAR file they were extracted from. Since we assume we are using Maven, there are basically two scenarios:
Still, it's conceivable that multiple JAR files with the same name but with different content get used at the same time by different processes, but if that becomes a problem, we can easily extend the simple scheme above by appending say the size and the last modified date of the JAR file to the subdirectory name in jarcahe, and just create a symbolic link to the latest one. We can get back deterministic names that way. But I don't think we will need to do that, so let's not worry about that for now. It's something we can add later if needed. In any case, since we'd be caching on a per JAR file basis, I guess we could validate only with the last modified date on the subdirectory, something that wouldn't be possible if we tried to map uber JARs to some coordinates, in which case we would have to check each file one by one. More importantly, we shouldn't use the FileSystem API. This way, we could use JavaCPP on Android also to extract resources in a portable way. If you think that what I describe above would meet your needs, that's something I'd be ready to code from scratch myself, but let me know. If it does not though, could you make your requirements more explicit and why they are not satisfied? (Not so sure anymore about using ".cache/javacpp". The actual cache directory depends on the OS and sounds like it would make things confusing, http://wiki.netbeans.org/FaqWhatIsUserdir , for example, and then there is Android's own scheme. Besides, some tools like ccache still don't use it.) Scripts for the Android emulator? I'm afraid it's a lot of manual installation of everything. Google does not make Android amenable to automation. :( |
Supporting uber jars might be useful. My use case is covered with just expending jars with coordinate attributes, so either way for me. If you feel like re-writing it, as long as it has include file support and stable paths I'm happy. I probably have already invested an unreasonable amount of time in this feature. I just like the long term potential of being able to compile native code without having to install anything, or even at runtime. I want to have it, and I don't really care about the specifics of how it works. What would be your time estimate for re-writing it? thks |
It's not about supporting uber JAR or not supporting uber JAR, it's about not breaking existing functionality... |
I've finally been able to spend a bit of time on this. Please check the It currently doesn't support caching directories, yet. When the "resource" is a directory, we would need to add special support, and using the FileSystem API would be fine. In any case, BTW, on Windows, we would also need to bundle the |
OK that sounds good, I like the class + "include" API. As I mentioned before we need to make sure the paths will be predictable, so that we could hard-code if needed, e.g. in a build script outside of the JVM. Probably only needed for development but I'm sure other uses cases will popup as we go. |
As long as the JAR files in the class path are taken from the local Maven repository, they will be predictable... Will that always be the case for what you are anticipating to do? |
Yes, that should be fine. |
We now have |
I have a need for #43 again, so I resumed working on it. To help merge it, I decided to split the old PR into two. This first part is the repo itself, without the include files jar or symlinks support. While we iterate on this, I will rebase and work on the rest of the old PR.